1
0
Commit Graph

1724 Commits

Author SHA1 Message Date
Andrew Waterman
d811692c3b Mitigate I$->D$->I$ critical path
This seemingly irrelevant change shaves several gate delays off the I$
tl.a.valid path.
2017-07-31 01:43:04 -07:00
Andrew Waterman
ac4339a8e7 Pass D$ backpressure to D-channel, rather than asserting 2017-07-29 11:48:36 -07:00
Andrew Waterman
edcd2c696c Avoid needless stall on E-channel back pressure 2017-07-29 11:47:58 -07:00
Wesley W. Terpstra
8e2e931770 Merge pull request #903 from freechipsproject/monitor-probes
tilelink: use the Monitor to enforce Probe sourcing
2017-07-29 01:12:08 -07:00
Wesley W. Terpstra
56e28026a6 TLError: does not need to be fast; cut the loop
The SystemBus already has a flow buffer on outputs.
2017-07-29 00:22:21 -07:00
Wesley W. Terpstra
540256e24a systembus: all slaves should have an output buffer 2017-07-29 00:13:33 -07:00
Wesley W. Terpstra
eadf4e9fcc Revert "tile: add option for tile boundary buffers"
This reverts commit b64b87ad07.

The crossings already have buffering in those places where it was
appropriate. Adding more does not help flow through paths.
2017-07-29 00:03:24 -07:00
Wesley W. Terpstra
68064ba260 systembus: don't double down on buffers
The order should be:
  master => buffer|xing => fifofixer => splitter => xbar
2017-07-29 00:02:12 -07:00
Yunsup Lee
140086e2c5 Merge pull request #902 from freechipsproject/perf-improvements
Perf improvements
2017-07-28 20:12:10 -07:00
Wesley W. Terpstra
a0db929003 tilelink: use the Monitor to enforce Probe sourcing 2017-07-28 18:08:00 -07:00
Megan Wachs
573890e102 Merge pull request #900 from freechipsproject/more_verbose_requires
diplomacy: More verbose require
2017-07-28 13:23:33 -07:00
Andrew Waterman
fdb8935712 Improve fidelity of two perf counters 2017-07-28 13:14:04 -07:00
Andrew Waterman
4c82f6b77e Don't refill BTB on not-taken branches 2017-07-28 13:13:52 -07:00
Andrew Waterman
2e8b02e780 Merge D$ store hits when ECC is enabled
This avoids pipeline flushes due to subword WAW hazards, as with
consecutive byte stores.
2017-07-28 12:56:36 -07:00
Andrew Waterman
838864870e Bypass TLB refill signal to halve L2 TLB hit time
The 4-cycle hit time is 1 cycle too long to avoid a second
pipeline replay, so it was effectively 9 cycles instead of 4.
2017-07-28 12:56:36 -07:00
Andrew Waterman
ae1f7a95f6 Don't nack misses when there's a pending store
That effectively increased the miss latency by 5 cycles when there was
a store hit followed by a load miss.  Since pending stores are drained
when releaseInFlight, the check I removed was redundant.
2017-07-28 12:56:36 -07:00
Henry Cook
7eeb9dfd88 Merge pull request #899 from freechipsproject/wrapper-dedup
Stabilize tile wrappers for downstream tools
2017-07-28 10:52:59 -07:00
Megan Wachs
f61fe2be1e diplomacy: More verbose require 2017-07-28 10:05:45 -07:00
Wesley W. Terpstra
5f81c2243f tilelink: add BusBypass, useful to turn devices off 2017-07-27 20:16:30 -07:00
Wesley W. Terpstra
9a36755b6a tilelink: CacheCork uses constructor helpers 2017-07-27 18:38:15 -07:00
Wesley W. Terpstra
45189c3e30 tilelink: CacheCork now supports errors and BtoT upgrade
- Acquire.BtoT succeeds with toT instantly
- AccessAckData.error causes Grant.toN.error
2017-07-27 18:38:13 -07:00
Wesley W. Terpstra
2e4f1611ed tilelink: Error device supports Acquire
We need this if we want to divert traffic to it from a TL-C slave.
2017-07-27 18:32:58 -07:00
Henry Cook
b64b87ad07 tile: add option for tile boundary buffers 2017-07-27 17:30:51 -07:00
Henry Cook
289ef30dbc coreplex: change AsynchronousCrossing.sync default to 3 2017-07-27 15:44:51 -07:00
Henry Cook
266ed56e8d tile: turn off more slave port monitors 2017-07-27 15:28:53 -07:00
Henry Cook
9a483af6e8 coreplex: naming of tile wrappers 2017-07-27 15:16:48 -07:00
Henry Cook
33852ef965 coreplex: remove superfluous sink and source from wrapper 2017-07-27 14:23:03 -07:00
Wesley W. Terpstra
651da73d89 tilelink: it is now legal to support Acquire for UNCACHED regions
These cases exist:
  GET_EFFECTS, PUT_EFFECTS, UNCACHEABLE && !supportsAcquire: MMIO
  UNCACHED && !supportsAcquire: speculation ok and may be cached
  UNCACHED && supportsAcquire: LLC/CacheCork applied (slave never probes)
  CACHED, TRACKED && supportsAcquire: slave might probe
2017-07-27 11:11:22 -07:00
Wesley W. Terpstra
0ab5cb67b3 tilelink: fix RAMModel handling of AMOs on early source reuse (#897) 2017-07-27 11:07:13 -07:00
Wesley W. Terpstra
9804bdc34e tilelink: remove obsolete addr_lo signal (#895)
When we first implemented TL, we thought this was helpful, because
it made WidthWidgets stateless in all cases. However, it put too
much burden on all other masters and slaves, none of which benefitted
from this signal. Furthermore, even with addr_lo, WidthWidgets were
information lossy because when they widen, they have no information
about what to fill in the new high bits of addr_lo.
2017-07-26 16:01:21 -07:00
Wesley W. Terpstra
d096d5d1c4 tilelink: fix AtomicAutomata bug wrt early source reuse
The new fuzzer already found it's first victim.
2017-07-26 12:52:29 -07:00
Wesley W. Terpstra
6550ae2e31 tilelink: increase Fuzzer source reuse aggression 2017-07-26 12:37:31 -07:00
Wesley W. Terpstra
1efdca106c tilelink: RAMModel support early reuse of source 2017-07-26 12:37:31 -07:00
Wesley W. Terpstra
138276fd87 tilelink: SourceShrinker should work also for 0 latency 2017-07-26 12:37:31 -07:00
Wesley W. Terpstra
b2edca2a6b tilelink: cut WidthWidget from dependency on addr_lo 2017-07-26 10:31:09 -07:00
Wesley W. Terpstra
ede87c1f73 tilelink: rewrite WidthWidget beat splitter
- split the data based on the address, not the mask
  (the first version of TileLink did not have low address bits)
- the dependency on addr_lo is now exposed and easy to replace
2017-07-26 10:24:16 -07:00
Wesley W. Terpstra
0f5065fbf3 tilelink: WidthWidget rewrite beat merging
- errors are properly OR reduced
- registers latched only as needed (was previously a shift register)
- combines beats without inspecting address (removes addr_lo dependency)
2017-07-26 10:24:12 -07:00
Wesley W. Terpstra
f0ffb7e31e tilelink: initialize toggle in Fragmenter (#894)
No strictly necessary, because the initial value does not matter, but good hygiene since it drives a cycle of logic.
2017-07-26 10:21:31 -07:00
Andrew Waterman
5a5b78b15e Improve L2 TLB coding style 2017-07-26 02:22:43 -07:00
Andrew Waterman
5a9c673f41 Fix L2 TLB response bug
Sometimes, it would inform the L1 TLB that the translation was for
a superpage, even though that's never the case.
2017-07-26 02:20:41 -07:00
Andrew Waterman
acca0fccf5 Fix BTB not being refilled on some indirect jumps
We are overloading the BTB-hit signal to mean that any part of the frontend
changed the control-flow, not just the BTB.  That's the right thing to do for
most of the control logic, but it means the BTB sometimes won't get refilled
when we'd like it to.  This commit makes the frontend use an invalid BTB entry
number when it, rather than the BTB, changes the control flow.  Since the
entry number is invalid, the BTB will treat it as a miss and refill itself.

This is kind of a hack, but a more palatable fix requires reworking the RVC
IBuf, which I don't have time for right now.
2017-07-26 02:13:43 -07:00
Yunsup Lee
6916e5cbfb coreplex: better names for RocketTiles in Verilog (#890) 2017-07-25 16:35:31 -07:00
Andrew Waterman
d43f02268b Merge pull request #889 from freechipsproject/acq-before-rel-and-jump-in-frontend
Acquire before release; jump in frontend
2017-07-25 16:26:47 -07:00
Wesley W. Terpstra
c2b8b08461 tilelink: fix Fragmenter source re-use bug (#888)
Consider the following waveform for two 4-beat bursts:
---A----A------------
-------D-----DDD-DDDD
Under TL rules, the second A can use the same source as the first A,
because the source is released for reuse on the first response beat.

However, if we fragment the requests, it looks like this:
---3210-3210---------
-------3-----210-3210
... now we've broken the rules because 210 are twice inflight.

To solve this, we alternate an a.source bit every time D completes a txn.
2017-07-25 16:23:55 -07:00
Andrew Waterman
15878d4691 Perform some control-flow transfers within the Frontend 2017-07-25 15:19:16 -07:00
Andrew Waterman
62c4080585 Add RVC instruction patterns 2017-07-25 15:19:16 -07:00
Andrew Waterman
66d06460fa Add option for acquire-before-release 2017-07-25 15:19:16 -07:00
Andrew Waterman
86ccd935fc Add method to print perf events 2017-07-25 15:19:16 -07:00
Andrew Waterman
5df8f0d1ea Add L2 TLB miss counter 2017-07-25 15:19:16 -07:00
Andrew Waterman
3ced04b70a Mix in trait to connect global_reset_vector 2017-07-25 15:19:16 -07:00